It is our pleasure to congratulate the authors (hereafter DKSC) on an interesting paper that was a delight to read. While DKSC provide a remarkable collection of connections between different representations of the Markov chains in their paper, we will focus on the "running time analysis" portion. This is a familiar problem to statisticians: given a target population, how can we obtain a representative sample? In the context of Markov chain Monte Carlo (MCMC) the problem can be stated as follows. Let $\{X_0, X_1, X_2, \ldots\}$ be an irreducible aperiodic Markov chain with invariant probability distribution $\pi$ having support $\mathsf{X}$ and let $P^n$ denote the distribution of $X_n \mid X_0$ for $n \ge 1$, that is, $P^n(x, A) = \Pr(X_n \in A \mid X_0 = x)$. Then, given $\omega > 0$, can we find a positive integer $n^*$ such that

$$\|P^{n^*}(x,\cdot) - \pi(\cdot)\| \le \omega, \tag{1}$$

where $\|\cdot\|$ is the total variation norm? If we can find such an $n^*$, then, since $\|P^n - \pi\|$ is nonincreasing in $n$, every draw past $n^*$ will also be within $\omega$ of $\pi$, thus providing a representative sample if we keep only the draws after $n^*$. There is an enormous amount of research (too much to list here!) on this problem for a wide variety of Markov chains. Unfortunately, there is apparently little that can be said generally about this problem, so we are forced to analyze each Markov chain individually or at most within a limited class of models or situations. This is somewhat reflected in the current paper since, as noted by DKSC, the techniques introduced here do not apply to even all of the exponential families (with a conjugate prior) in the paper. However, DKSC derive some impressive results that would seem difficult to improve upon. In the rest of this discussion we will review some of their findings and compare them to results possible via the so-called (by DKSC) "Harris recurrence techniques."

1. FINITE SAMPLE VS. ASYMPTOTIC 

Perhaps the most common use of MCMC by statisticians is to estimate an expectation with respect to $\pi$. More specifically, suppose $g : \mathsf{X} \to \mathbb{R}$ and our goal is to calculate

$$E_\pi g = \int_{\mathsf{X}} g(x)\,\pi(dx).$$


In typical MCMC settings, this quantity is analytically intractable and we estimate it with a sample average

$$\bar{g}_{n,B} = \frac{1}{n} \sum_{i=B}^{n+B-1} g(X_i)$$

over the observed path of the Markov chain. Here $B$ denotes the burn-in. If we can find $n^*$ satisfying (1) it would be natural to set $B = n^*$, thereby reducing the inherent bias in $\bar{g}_{n,B}$ but possibly increasing its variance compared to using all $n + B$ draws. In any case, $\bar{g}_{n,B}$ is a useful estimator since, as long as $E_\pi|g| < \infty$, we have a strong law: $\bar{g}_{n,B} \to E_\pi g$ with probability 1 as $n \to \infty$.
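The burn-in estimator described above can be sketched in a few lines of Python. This is only an illustration: the function name `burn_in_average`, the convention that the first `burn_in` draws are discarded, and the toy path are assumptions made here, not part of the discussion.

```python
from typing import Callable, Sequence


def burn_in_average(path: Sequence[float], burn_in: int,
                    g: Callable[[float], float] = lambda x: x) -> float:
    """Average g over the draws that remain after discarding the first `burn_in`."""
    kept = path[burn_in:]
    if len(kept) == 0:
        raise ValueError("burn-in discards every draw")
    return sum(g(x) for x in kept) / len(kept)


# Toy path with made-up numbers (not output from any sampler in the text).
toy_path = [5.0, 3.0, 1.0, -1.0, 1.0, -1.0]

with_burn_in = burn_in_average(toy_path, burn_in=2)   # averages the last 4 draws
all_draws = burn_in_average(toy_path, burn_in=0)      # averages all 6 draws
print(with_burn_in, all_draws)
```

On this toy path the early draws are far from the later ones, so discarding them changes the estimate noticeably, at the price of averaging over fewer draws.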

Of course, no matter what the simulation length (i.e., $n + B$), there will be an unknown Monte Carlo error in our estimate, namely $\bar{g}_{n,B} - E_\pi g$. When it holds, a Markov chain central limit theorem (CLT) provides an approximate sampling distribution of the Monte Carlo error as well as a basis for finding the corresponding Monte Carlo standard error; see Flegal, Haran and Jones (2008), Jones et al. (2006) and Jones and Hobert (2001).
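One common way to turn a Markov chain CLT into a Monte Carlo standard error is the method of non-overlapping batch means. The sketch below is an assumption about how such a calculation might look; the text does not say which estimator is intended, and the function name `batch_means_se` and the choice of batch count are hypothetical.

```python
import math
from typing import Sequence


def batch_means_se(draws: Sequence[float], n_batches: int) -> float:
    """Monte Carlo standard error of the sample mean via non-overlapping batch means."""
    batch_size = len(draws) // n_batches
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least two non-empty batches")
    used = draws[: n_batches * batch_size]  # drop any ragged tail
    means = [sum(used[i * batch_size:(i + 1) * batch_size]) / batch_size
             for i in range(n_batches)]
    grand_mean = sum(means) / n_batches
    # batch_size times the sample variance of the batch means estimates
    # the asymptotic variance appearing in the CLT.
    sigma2_hat = batch_size * sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(sigma2_hat / len(used))
```

The estimate is only meaningful when a CLT actually holds, which is why conditions such as geometric ergodicity matter in practice.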

Thus, we see that there are two questions we want to answer in order to handle the output of the sampler in a sensible fashion. Specifically, what is $n^*$ and when does a Markov chain CLT hold? At first glance these two properties are seemingly unrelated. That is, finding $n^*$ is about a finite-sample property of the Markov chain while the existence of a CLT is asymptotic. In fact, they are not so different. A key sufficient condition for the existence of a Markov chain CLT is that the Markov chain is geometrically ergodic. That is, there exist $M : \mathsf{X} \to \mathbb{R}^+$ and a constant $t \in (0,1)$ such that

$$\|P^n(x,\cdot) - \pi(\cdot)\| \le M(x)\,t^n. \tag{2}$$

(See Jones (2004) and Roberts and Rosenthal (2004) for much more on Markov chain CLTs.) Note that as long as the initial value of the simulation is not chosen too poorly, that is, $M(x)$ is not too large, geometric ergodicity ensures rapid convergence. Also, if we could find $M$ and $t$ satisfying (2), then a CLT would hold as long as $E_\pi|g|^{2+\delta} < \infty$ for some $\delta > 0$ and we could easily use (2) to find $n^*$. Unfortunately, $M$ and $t$ are rarely available in practically relevant settings, so the best we can hope to do is find bounds for them. In the next section we consider a constructive method for establishing (2), that is, the existence of $M$ and $t$.
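To make the step from (2) to $n^*$ concrete: if values of $M(x)$ and $t$ were known, the smallest $n$ with $M(x)t^n \le \omega$ follows from taking logarithms. The Python sketch below assumes such values are available; the function name and the example numbers are hypothetical.

```python
import math


def n_star_from_geometric_bound(M_x: float, t: float, omega: float) -> int:
    """Smallest n >= 0 with M_x * t**n <= omega, assuming 0 < t < 1."""
    if not (0.0 < t < 1.0) or M_x <= 0.0 or omega <= 0.0:
        raise ValueError("need 0 < t < 1, M_x > 0 and omega > 0")
    if M_x <= omega:
        return 0
    n = math.ceil(math.log(omega / M_x) / math.log(t))
    # Guard against floating-point rounding at the boundary.
    while M_x * t ** n > omega:
        n += 1
    while n > 0 and M_x * t ** (n - 1) <= omega:
        n -= 1
    return n


# Made-up constants: M(x) = 2, t = 1/2, tolerance 0.01.
print(n_star_from_geometric_bound(2.0, 0.5, 0.01))
```

Because any valid pair $(M, t)$ gives an upper bound on the distance, the resulting $n$ is sufficient but not necessarily the smallest burn-in that achieves the tolerance.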

2. DRIFT AND MINORIZATION

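A drift condition holds if there exists some function $V : \mathsf{X} \to [0, \infty)$ and constants $0 < \gamma < 1$ and $L < \infty$ such that

$$E[V(X_{i+1}) \mid X_i = x] \le \gamma V(x) + L \quad \text{for all } x \in \mathsf{X}. \tag{3}$$

The set $C \subseteq \mathsf{X}$ is small if there exists a probability measure $Q$ on $\mathsf{X}$ and some $\epsilon > 0$ for which the following minorization condition holds:

$$P(x, A) \ge \epsilon\, Q(A) \quad \text{for all } x \in C \text{ and } A \in \mathcal{B}(\mathsf{X}). \tag{4}$$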

If a drift condition holds and the set $C = \{x : V(x) \le w\}$ for $w > 2L/(1 - \gamma)$ is small, then the Markov chain is geometrically ergodic. Rosenthal (1995) showed that in this case, the drift and minorization conditions can also be used to find a value of $n^*$ satisfying (1); see also Baxendale (2005), Hobert and Robert (2004) and Roberts and Tweedie (1999). While Rosenthal's theorem often results in conservative values for $n^*$, they can still be useful. We will illustrate this in a simple example below; see Jones and Hobert (2004) for a practically relevant example.

There are at least two other interesting implications of drift and minorization: Kendall (2004) showed that the existence of drift and minorization implies the existence of a perfect sampling algorithm, and Latuszynski (2008) has recently shown that drift and minorization can be used to find a simulation length that will guarantee that $\bar{g}_{n,B}$ is within a user-specified distance of $E_\pi g$ with a user-specified probability.

3. DRIFT FOR EXPONENTIAL FAMILIES 

Following Section 2.3.1 of DKSC, we will assume that the distribution of $X \mid \theta$ is from one of the six exponential families and use the conjugate prior for $\theta$. Let $m(\cdot)$ be the marginal density of $X$. Also, let $\{X_0, X_1, X_2, \ldots\}$ denote the $x$-chain for these families with one-step and $l$-step transition densities $k_x(\cdot)$ and $k_x^l(\cdot)$, respectively. DKSC construct sharp bounds on the total variation distance of the chain to stationarity, $\|k_x^l - m\|$. Our goal is to construct a drift and associated minorization condition for the $x$-chain. As discussed above, this will allow us to conclude the Markov chains are geometrically ergodic and compare the value of $n^*$ given by Rosenthal's theorem to that obtained by DKSC.

As in DKSC, the conditional expectations of $X$ and $\theta$ have a special form when $\theta$ is assigned a conjugate prior, specifically, $E(X^k \mid \theta)$ and $E(\theta^k \mid X)$ "are polynomials of degree $k$ in $\theta$ and $X$ respectively." That is, we can define constants so that

$$\begin{aligned}
E(X \mid \theta) &= a\theta + b, & E(\theta \mid X) &= fX + g, \\
E(X^2 \mid \theta) &= c\theta^2 + d\theta + e, & E(\theta^2 \mid X) &= hX^2 + jX + k.
\end{aligned} \tag{5}$$

Proposition 1. Using the notation defined in (5), assume $ch < 1$ and define

$$u = \frac{df + cj}{2(af - ch)}.$$

Fix $\gamma \in [ch, 1)$. Then the following drift condition is satisfied for the $x$-chain making transition $x \to Y$:

$$E[V(Y) \mid x] \le \gamma V(x) + L,$$

where $V(Y) = (Y - u)^2$ and $L = ck + gd + e + u^2(1 - ch) - 2u(ga + b)$.
Proof. First, notice that $E[V(Y) \mid x] = E[(Y - u)^2 \mid x]$ where

$$\begin{aligned}
E[(Y - u)^2 \mid x] &= E\{E[(Y - u)^2 \mid \theta] \mid x\} \\
&= E\{\mathrm{Var}[(Y - u) \mid \theta] + (E[(Y - u) \mid \theta])^2 \mid x\} \\
&= E\{E(Y^2 \mid \theta) - [E(Y \mid \theta)]^2 + [E(Y \mid \theta) - u]^2 \mid x\} \\
&= E\{E(Y^2 \mid \theta) - 2uE(Y \mid \theta) + u^2 \mid x\}.
\end{aligned}$$

Combining this and (5) with some algebra gives

$$\begin{aligned}
E\{E(Y^2 \mid \theta) - 2uE(Y \mid \theta) + u^2 \mid x\} &= E[c\theta^2 + \theta(d - 2ua) \mid x] + [e - 2ub + u^2] \\
&= ch(x - u)^2 + L.
\end{aligned}$$

Therefore,

$$E[V(Y) \mid x] = ch\,V(x) + L \le \gamma V(x) + L$$

and the result holds. □

Remark 1. Proposition 1 holds for each of the Beta/Binomial, Poisson/Gamma and Gaussian families. However, restrictions must be placed on the hyperparameters of the remaining three exponential families to ensure $ch < 1$.

Remark 2. We are making no claim that the above drift condition is optimal for all (or even any) of the exponential families considered in DKSC. In fact, it is easy to cook up many more functions for which (3) is easily verified, especially if each family is considered individually. We prefer drift functions satisfying (3) that lead to larger values of $\epsilon$ in (4) for $C = \{x : V(x) \le w\}$.

It is possible to establish a minorization condition for the setting so far described. However, we have found that exploiting the structure of a particular example often leads to a larger value of $\epsilon$. As an illustration we will focus on the Gaussian setting in the next section.

4. MINORIZATION IN THE GAUSSIAN EXAMPLE

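Consider the Gaussian model where $\nu = 0$ and $\sigma^2 = \tau^2 = 1/4$, that is,

$$X \mid \theta \sim N(\theta, 1/4) \quad \text{and} \quad \theta \sim N(0, 1/4).$$

From here, it is straightforward to show that

$$X \sim N(0, 1/2) \quad \text{and} \quad \theta \mid X \sim N(X/2, 1/8).$$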
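The two conditional distributions above define a two-component Gibbs sampler whose $x$-chain can be simulated directly. The following Python sketch is one way to do so; the function name `gaussian_x_chain`, the seed and the run length are choices made here for illustration only.

```python
import random
from typing import List


def gaussian_x_chain(x0: float, n_steps: int, seed: int = 0) -> List[float]:
    """Simulate the x-chain for X | theta ~ N(theta, 1/4), theta ~ N(0, 1/4)."""
    rng = random.Random(seed)
    sd_theta_given_x = (1.0 / 8.0) ** 0.5  # theta | X = x ~ N(x/2, 1/8)
    sd_x_given_theta = (1.0 / 4.0) ** 0.5  # X | theta ~ N(theta, 1/4)
    path = [x0]
    x = x0
    for _ in range(n_steps):
        theta = rng.gauss(x / 2.0, sd_theta_given_x)  # draw theta given current x
        x = rng.gauss(theta, sd_x_given_theta)        # draw next x given theta
        path.append(x)
    return path


path = gaussian_x_chain(x0=0.0, n_steps=20000, seed=1)
mean = sum(path) / len(path)
var = sum((x - mean) ** 2 for x in path) / (len(path) - 1)
print(mean, var)  # should be near 0 and 1/2, up to Monte Carlo error
```

The sample mean and variance give a rough check against the stated marginal $N(0, 1/2)$; they say nothing rigorous about how quickly the chain converges.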



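A similar Gibbs sampler is analyzed in Example 1 of Rosenthal (1995). In this case, $a = c = 1$, $b = d = g = j = 0$, $e = 1/4$, $f = 1/2$, $h = 1/4$ and $k = 1/8$ since

$$\begin{aligned}
E(X \mid \theta) &= \theta, & E(X^2 \mid \theta) &= \theta^2 + \tfrac{1}{4}, \\
E(\theta \mid X) &= \tfrac{1}{2}X, & E(\theta^2 \mid X) &= \tfrac{1}{4}X^2 + \tfrac{1}{8}.
\end{aligned}$$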

Notice that $ch = 1/4$, $u = 0$, and $L = 3/8$ where $u$ and $L$ are as defined by Proposition 1. Therefore, $V(x) = x^2$ and the drift condition is

$$E[Y^2 \mid x] \le \gamma x^2 + 3/8 \quad \text{for } \gamma \in [1/4, 1).$$
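The constants in Proposition 1 are simple enough to check with exact rational arithmetic. The sketch below, with hypothetical helper names `drift_constants` and `conditional_second_moment`, evaluates $u$, $ch$ and $L$ for the Gaussian constants listed above and recomputes $E[(Y - u)^2 \mid x]$ directly from the polynomial moments in (5).

```python
from fractions import Fraction as F


def drift_constants(a, b, c, d, e, f, g, h, j, k):
    """Return (u, ch, L) from Proposition 1 for the moment constants in (5)."""
    ch = c * h
    if not ch < 1:
        raise ValueError("Proposition 1 assumes ch < 1")
    u = (d * f + c * j) / (2 * (a * f - ch))
    L = c * k + g * d + e + u ** 2 * (1 - ch) - 2 * u * (g * a + b)
    return u, ch, L


def conditional_second_moment(x, u, a, b, c, d, e, f, g, h, j, k):
    """E[(Y - u)^2 | x], computed directly from the polynomial moments in (5)."""
    e_theta = f * x + g                # E(theta | x)
    e_theta2 = h * x ** 2 + j * x + k  # E(theta^2 | x)
    return c * e_theta2 + (d - 2 * u * a) * e_theta + e - 2 * u * b + u ** 2


# Constants for the Gaussian example with sigma^2 = tau^2 = 1/4.
gauss = dict(a=F(1), b=F(0), c=F(1), d=F(0), e=F(1, 4),
             f=F(1, 2), g=F(0), h=F(1, 4), j=F(0), k=F(1, 8))
u, ch, L = drift_constants(**gauss)
print(u, ch, L)  # 0 1/4 3/8
```

Comparing `conditional_second_moment(x, u, **gauss)` with `ch * (x - u) ** 2 + L` at a few values of `x` confirms the identity used in the proof of Proposition 1.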

Proposition 2 provides an associated minorization condition on the compact set $C = \{x : x^2 \le w\}$ where $w > 0$.

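Proposition 2. Define probability density $q(\cdot)$ on $\mathbb{R}$ by

$$q(y) = \frac{g(y)}{\epsilon},$$

where

$$g(y) = \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y + \frac{\sqrt{w}}{2}[I(y > 0) - I(y < 0)]\right)^2\right\}$$

and

$$\epsilon = \int g(y)\,dy = 2\Pr\left(Z < -\sqrt{\frac{2w}{3}}\right) \tag{6}$$

for $Z \sim N(0, 1)$. Then the following minorization condition holds for the transition density $k_x$ of the $x$-chain:

$$k_x(y) \ge \epsilon\, q(y) \quad \text{for all } x \in C,$$

where $C = \{x : x^2 \le w\}$ for $w > 0$.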

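Proof. Let $\pi(\theta \mid x)$ denote the density corresponding to the conditional distribution of $\theta$ given $X = x$, and let $f_\theta(y)$ denote the $N(\theta, 1/4)$ density of the next state given $\theta$. Recall that the transition density $k_x(y)$ can be written as

$$\begin{aligned}
k_x(y) &= \int f_\theta(y)\,\pi(\theta \mid x)\,d\theta \\
&= \int \sqrt{\frac{2}{\pi}} \exp\{-2(y - \theta)^2\} \sqrt{\frac{4}{\pi}} \exp\{-4(\theta - x/2)^2\}\,d\theta \\
&= \frac{\sqrt{8}}{\pi} \exp\left\{-\left[2y^2 + x^2 - 6\left(\frac{x + y}{3}\right)^2\right]\right\} \int \exp\left\{-6\left(\theta - \frac{x + y}{3}\right)^2\right\} d\theta \\
&= \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y - \frac{x}{2}\right)^2\right\}.
\end{aligned}$$

Also, notice that we can rewrite $C$ as

$$C = \{x : x^2 \le w\} = \{x : -\sqrt{w} \le x \le \sqrt{w}\}.$$

Therefore, for $x \in C$,

$$\begin{aligned}
k_x(y) &= \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y - \frac{x}{2}\right)^2\right\} \\
&\ge \inf_{x \in C} \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y - \frac{x}{2}\right)^2\right\} \\
&= \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3} \sup_{x \in C}\left(y - \frac{x}{2}\right)^2\right\} \\
&= \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y + \frac{\sqrt{w}}{2}[I(y > 0) - I(y < 0)]\right)^2\right\} \\
&= \epsilon\, q(y).
\end{aligned}$$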

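It only remains to show that $\epsilon$ satisfies (6). Let $Z$ represent a standard Normal random variable. Then

$$\begin{aligned}
\int g(y)\,dy &= \int_0^\infty \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y + \frac{\sqrt{w}}{2}\right)^2\right\} dy + \int_{-\infty}^0 \sqrt{\frac{4}{3\pi}} \exp\left\{-\frac{4}{3}\left(y - \frac{\sqrt{w}}{2}\right)^2\right\} dy \\
&= \Pr\left(Z > \sqrt{\frac{2w}{3}}\right) + \Pr\left(Z + \sqrt{\frac{2w}{3}} < 0\right) \\
&= 2\Pr\left(Z < -\sqrt{\frac{2w}{3}}\right).
\end{aligned}$$

□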
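The minorization in Proposition 2 can be checked numerically on a grid. The Python sketch below is only a sanity check under assumptions made here: the helper names, the grid ranges, the trapezoid rule and the value of `w` (taken from the numerical example in this section) are illustrative, and `g_lower` uses $|y|$, which agrees with the indicator form of $g$ except at the single point $y = 0$.

```python
import math

NORM_CONST = math.sqrt(4.0 / (3.0 * math.pi))  # 1 / sqrt(2 * pi * 3/8)


def k_x(y: float, x: float) -> float:
    """Transition density of the x-chain: Y | x ~ N(x/2, 3/8)."""
    return NORM_CONST * math.exp(-(4.0 / 3.0) * (y - x / 2.0) ** 2)


def g_lower(y: float, w: float) -> float:
    """Lower envelope of k_x(y) over x in C = [-sqrt(w), sqrt(w)]."""
    return NORM_CONST * math.exp(-(4.0 / 3.0) * (abs(y) + math.sqrt(w) / 2.0) ** 2)


def epsilon(w: float) -> float:
    """2 * Pr(Z < -sqrt(2w/3)) for standard normal Z, written as erfc(sqrt(w/3))."""
    return math.erfc(math.sqrt(w / 3.0))


w = 2.203030
root_w = math.sqrt(w)
step = 0.01
ys = [-6.0 + step * i for i in range(1201)]                   # grid on [-6, 6]
xs = [-root_w + 2.0 * root_w * i / 100 for i in range(101)]   # grid on C

# k_x(y) should dominate g(y) for every x in C (small slack for rounding).
minorized = all(k_x(y, x) >= g_lower(y, w) - 1e-12 for y in ys for x in xs)

# Trapezoid rule for the integral of g, which should match epsilon in (6).
integral = step * (sum(g_lower(y, w) for y in ys)
                   - 0.5 * (g_lower(ys[0], w) + g_lower(ys[-1], w)))
print(minorized, integral, epsilon(w))
```

A grid check of this kind cannot replace the proof, but it is a cheap way to catch algebra slips in the constants.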



Now assume $w > 2L/(1 - \gamma) = 3/[4(1 - \gamma)]$. Putting Propositions 1 and 2 together immediately guarantees the $x$-chain is geometrically ergodic. These results also allow us to calculate an upper bound on the total variation distance of the $x$-chain to stationarity using Theorem 12 in Rosenthal (1995). Specifically, for $r = 0.1895820$ (using Rosenthal's notation), $\gamma = 1/4$, and $w = 2.203030$, we have

$$\|k_x^l - m\| \le 0.952697^l + (1.5 + x^2)\,0.9328785^l, \tag{7}$$

where $r$, $\gamma$ and $w$ were selected by searching over a grid of candidate values. Now, suppose our goal is to find $n^*$ for which $\|k_0^{n^*} - m\| \le 0.01$. In this case, Rosenthal's theorem yields $n^* = 99$ since from (7) we have

$$\|k_0^{99} - m\| \le 0.00980.$$
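The numbers in (7) can be reproduced from the drift and minorization constants. The sketch below assumes the usual statement of Rosenthal's Theorem 12, namely a bound of the form $(1-\epsilon)^{rl} + (\alpha^{-(1-r)}A^{r})^{l}\,(1 + L/(1-\gamma) + V(x))$ with $\alpha^{-1} = (1 + 2L + \gamma w)/(1 + w)$ and $A = 1 + 2(\gamma w + L)$; that form is recalled here rather than quoted from the text, and the function names are hypothetical.

```python
import math


def rosenthal_rates(r=0.1895820, gamma=0.25, w=2.203030, L=3.0 / 8.0):
    """Return the two geometric rates appearing in the bound (7)."""
    eps = math.erfc(math.sqrt(w / 3.0))  # epsilon from (6)
    alpha_inv = (1.0 + 2.0 * L + gamma * w) / (1.0 + w)
    A = 1.0 + 2.0 * (gamma * w + L)
    return (1.0 - eps) ** r, alpha_inv ** (1.0 - r) * A ** r


def rosenthal_bound(l, x, r=0.1895820, gamma=0.25, w=2.203030, L=3.0 / 8.0):
    """Upper bound on the total variation distance after l steps, started at x."""
    rate1, rate2 = rosenthal_rates(r, gamma, w, L)
    return rate1 ** l + (1.0 + L / (1.0 - gamma) + x ** 2) * rate2 ** l


def n_star(x, tol, max_steps=100000):
    """Smallest l with rosenthal_bound(l, x) <= tol (the bound decreases in l)."""
    for l in range(1, max_steps + 1):
        if rosenthal_bound(l, x) <= tol:
            return l
    raise RuntimeError("tolerance not reached within max_steps")


print(rosenthal_rates())          # approximately (0.952697, 0.9328785)
print(n_star(x=0.0, tol=0.01))    # 99
print(rosenthal_bound(99, 0.0))   # approximately 0.00980
```

A grid search over `r`, `gamma` and `w`, as described in the text, would wrap `n_star` in a loop over candidate values subject to $w > 2L/(1 - \gamma)$.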

On the other hand, since $\nu = 0$ and $\sigma^2 + \tau^2 = 1/2$, we can also apply Proposition 4.8 in DKSC to obtain an upper bound on $\|k_0^l - m\|$. Namely, Proposition 4.8 gives

$$\|k_x^l - m\| \le \frac{1}{2} \sqrt{\frac{\exp\{x^2\, 2^{1-2l}/(1 + 2^{-2l})\}}{\sqrt{1 - 2^{-4l}}} - 1}. \tag{8}$$

Thus, when the chain is started at $X_0 = 0$, we find that $n^* = 3$ iterations are sufficient for the $x$-chain to come within 0.01 of the stationary distribution in total variation distance. In fact, using (8) we obtain

$$\|k_0^3 - m\| \le 0.00552.$$
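The bound from Proposition 4.8 of DKSC is equally easy to evaluate. The sketch below assumes the bound has the form one half times the square root of $\exp\{x^2 2^{1-2l}/(1+2^{-2l})\}/\sqrt{1-2^{-4l}} - 1$, a reading of the garbled display that reproduces the reported value 0.00552; the function name is hypothetical.

```python
import math


def dksc_gaussian_bound(l: int, x: float) -> float:
    """Total variation bound for the Gaussian x-chain after l steps, started at x."""
    rho2 = 2.0 ** (-2 * l)  # squared lag-l autocorrelation, 2^(-2l)
    chi2 = (math.exp(x ** 2 * 2.0 * rho2 / (1.0 + rho2))
            / math.sqrt(1.0 - rho2 ** 2) - 1.0)
    return 0.5 * math.sqrt(max(chi2, 0.0))  # clamp guards against rounding


first_l = next(l for l in range(1, 100) if dksc_gaussian_bound(l, 0.0) <= 0.01)
print(first_l)                         # 3
print(dksc_gaussian_bound(3, 0.0))     # approximately 0.00552
```

Running both bounds side by side makes the comparison in the text concrete: 3 iterations here against 99 from drift and minorization, for the same chain, starting value and tolerance.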



Clearly, these bounds are sharper than those based on drift and minorization. However, the result based on drift and minorization is still very reasonable. Moreover, it is not clear to us how to apply the techniques of DKSC to the setting where the prior is not conjugate or perhaps is a mixture of conjugate priors. Drift and minorization are flexible enough that they are still applicable in these settings.